Less is More: Compact Matrix Decomposition for Large Sparse Graphs

نویسندگان

  • Jimeng Sun
  • Yinglian Xie
  • Hui Zhang
  • Christos Faloutsos
چکیده

Given a large sparse graph, how can we find patterns and anomalies? Several important applications can be modeled as large sparse graphs, e.g., network traffic monitoring, research citation network analysis, social network analysis, and regulatory networks in genes. Low rank decompositions, such as SVD and CUR, are powerful techniques for revealing latent/hidden variables and associated patterns from high dimensional data. However, those methods often ignore the sparsity property of the graph, and hence usually incur too high memory and computational cost to be practical. We propose a novel method, the Compact Matrix Decomposition (CMD), to compute sparse low rank approximations. CMD dramatically reduces both the computation cost and the space requirements over existing decomposition methods (SVD, CUR). Using CMD as the key building block, we further propose procedures to efficiently construct and analyze dynamic graphs from real-time application data. We provide theoretical guarantee for our methods, and present results on two real, large datasets, one on network flow data (100GB trace of 22K hosts over one month) and one on DBLP (200MB over 25 years). We show that CMD is often an order of magnitude more efficient than the state of the art (SVD and CUR): it is over 10X faster, but requires less than 1/10 of the space, for the same reconstruction accuracy. Finally, we demonstrate how CMD is used for detecting anomalies and monitoring timeevolving graphs, in which it successfully detects worm-like hierarchical scanning patterns in real network data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

متن کامل

Tree-decompositions with bags of small diameter

This paper deals with the length of a Robertson–Seymour’s tree-decomposition. The tree-length of a graph is the largest distance between two vertices of a bag of a tree-decomposition, minimized over all tree-decompositions of the graph. The study of this invariant may be interesting in its own right because the class of bounded tree-length graphs includes (but is not reduced to) bounded chordal...

متن کامل

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have so many applications in real world problems. When we deal with huge volume of data, analyzing data is difficult or sometimes impossible. In big data problems, clustering data is a useful tool for data analysis. Singular value decomposition(SVD) is one of the best algorithms for clustering graph but we do not have any choice to select the number of clusters and the number of members ...

متن کامل

Less is More: Sparse Graph Mining with Compact Matrix Decomposition

It is the new, multi-disciplinary journal that has already gained more than 1,000 institutional subscribers across six continents! It has clearly captured the interest of researchers in a wide range of disciplines and from around the world. Moreover, it represents not only an opportunity for you to be a part of this groundbreaking journal, but also to showcase your important findings to the world.

متن کامل

Solving the maximum edge-weight clique problem in sparse graphs with compact formulations

This paper proposes compact formulations for the maximum edge-weight clique (MEWC) problem on sparse graphs. The MEWC problem has long been discussed in the literature, but mostly addressing complete graphs, with or without a cardinality constraint on the clique. Yet, several realworld applications are defined on sparse graphs, where the missing edges are due to some thresholding process or bec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007